Creating a Dataset in Dataset Explorer

Dataset Explorer lets you create a collection that Corva apps can write to, read from, query, and sometimes subscribe to in real time. The most important decision is the dataset type. Most custom datasets are Time-Based, Depth-Based, or Reference. Some environments also show Time-series as a specialized option.

Choose the type by asking what makes one record different from the next one:

| If records are ordered by... | Choose | Typical examples |
| --- | --- | --- |
| Time | Time-Based | WITS summaries, frac stage measurements, task app outputs with timestamps, app-calculated time series |
| Measured depth | Depth-Based | MWD depth logs, depth-indexed drilling calculations, curves sampled every foot |
| Neither time nor depth | Reference | Lookup tables, user-entered configuration, catalog data, notes, mappings |
| A workflow explicitly requires Time-series | Time-series | Specialized platform or integration workflows that name this dataset type |
Danger: Do not choose a dataset type because of a feature you hope to use later. Type should describe the record's natural index. Capabilities such as copying records during reruns, publishing WebSocket events, or generating summaries should be handled as dataset or app configuration when that configuration is available.

Before You Create the Dataset

Write down these four decisions first:

  1. What is the natural index? Time, measured depth, or neither.
  2. Is the data asset-specific? If the records belong to a well, rig, or other asset, include asset_id.
  3. Will apps need live updates? If yes, choose Time-Based or Depth-Based and confirm that the dataset or app configuration supports WebSocket publication.
  4. Will reruns need this data? If the data must be copied or maintained with an asset during reruns, capture that requirement as dataset configuration instead of changing the dataset type to work around it.

If you are unsure between two types, use this rule: pick the type that matches how the app will query and append records most often.
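The decision rule in this section can be written down as a small helper. This is a minimal sketch: `choose_dataset_type` and its arguments are hypothetical names used only for illustration, and the function encodes this page's guidance rather than calling any Corva API.

```python
from typing import Literal

NaturalIndex = Literal["time", "measured_depth", "neither"]


def choose_dataset_type(natural_index: NaturalIndex,
                        workflow_requires_time_series: bool = False) -> str:
    """Return the dataset type suggested by the record's natural index.

    Hypothetical helper: it only restates the rule of thumb from this page.
    """
    # Time-series is chosen only when a workflow names it explicitly.
    if workflow_requires_time_series:
        return "Time-series"
    mapping = {
        "time": "Time-Based",
        "measured_depth": "Depth-Based",
        "neither": "Reference",
    }
    if natural_index not in mapping:
        raise ValueError(f"unknown natural index: {natural_index!r}")
    return mapping[natural_index]
```

Note that features such as reruns or WebSocket events are deliberately not inputs to the function, because they should not influence the type.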

Time-Based Datasets

Use a time-based dataset when each record represents a value, event, state, or summary at a timestamp. This is the most common choice for stream apps and scheduled apps that follow time data.

Good use cases:

  • A drilling stream app saves calculated values every second or every minute.
  • A completion app saves frac measurements by timestamp and optionally stage_number.
  • A task app stores historical time-series output that a frontend app will plot.
  • A frontend app needs to query or subscribe to values over a time range.
  • A Dev Center dataset needs to be visualized in the Corva Drilling Traces app.
  • An alert needs to evaluate values over time.

Do not use a time-based dataset when the record is fundamentally indexed by measured depth. If the question users ask is "what happened at this depth?", use depth-based.

Required record shape:

{
"timestamp": 1741786564,
"company_id": 1,
"asset_id": 123456,
"version": 1,
"provider": "my-company",
"collection": "company_name#pump-health",
"data": {
"pressure": 8421.3,
"rate": 74.2,
"state": "Pumping"
}
}
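Before writing records, an app can check locally that each one has the shape shown above. The sketch below assumes that every top-level key in that sample is mandatory; `missing_fields` is a hypothetical helper, not part of any Corva SDK.

```python
# Assumption: all top-level keys from the sample record are required.
TIME_BASED_REQUIRED = (
    "timestamp", "company_id", "asset_id", "version",
    "provider", "collection", "data",
)


def missing_fields(record: dict, required=TIME_BASED_REQUIRED) -> list:
    """Return the required keys that are absent from the record, in order."""
    return [key for key in required if key not in record]


record = {
    "timestamp": 1741786564,
    "company_id": 1,
    "asset_id": 123456,
    "version": 1,
    "provider": "my-company",
    "collection": "company_name#pump-health",
    "data": {"pressure": 8421.3, "rate": 74.2, "state": "Pumping"},
}

# An empty list means the record has every required key.
problems = missing_fields(record)
```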

Common optional fields:

{
"metadata": {
"pressure_unit": "psi",
"rate_unit": "bpm"
},
"app_key": "my-company.pump-health",
"app_version": 3,
"stage_number": 28
}

Example Dataset Explorer query:

{
"timestamp": { "$gte": 1741786564 },
"data.state": { "$in": ["Rotary Drilling", "Slide Drilling"] }
}
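To make the query semantics concrete, the sketch below evaluates the same filter against a few in-memory records. It is a local illustration only: `matches` and `get_path` are hypothetical helpers that support just equality, `$gte`, and `$in`, the sample records (including the "Circulating" state) are made up, and the code does not reproduce how Dataset Explorer executes queries.

```python
def get_path(record, dotted_key):
    """Resolve a dotted key such as 'data.state' against nested dicts."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(record, query):
    """Return True when the record satisfies every condition in the query."""
    for key, condition in query.items():
        value = get_path(record, key)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$gte":
                    if value is None or value < operand:
                        return False
                elif op == "$in":
                    if value not in operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != condition:
            return False
    return True


query = {
    "timestamp": {"$gte": 1741786564},
    "data.state": {"$in": ["Rotary Drilling", "Slide Drilling"]},
}
records = [
    {"timestamp": 1741786500, "data": {"state": "Rotary Drilling"}},
    {"timestamp": 1741786600, "data": {"state": "Rotary Drilling"}},
    {"timestamp": 1741786700, "data": {"state": "Circulating"}},
]
# Only records that pass both conditions are kept.
selected = [r["timestamp"] for r in records if matches(r, query)]
```

Both conditions must hold for a record to be selected, so the first record is excluded by the timestamp bound and the third by the state list.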

Depth-Based Datasets

Use a depth-based dataset when each record represents a value at a measured depth. Depth-based datasets are usually produced by backend apps that follow depth streams.

Good use cases:

  • A drilling stream depth app follows Log Type: Depth.
  • A scheduled depth app calculates values for a depth interval.
  • A frontend app plots values against measured depth instead of time.
  • Multiple depth logs exist on the same asset and records must stay scoped to the correct log.

Do not use a depth-based dataset for time-series data just because the asset is a well. If records arrive every second, every minute, or by event time, use time-based.

Required record shape:

{
"measured_depth": 6025,
"log_identifier": "751a6e120a86",
"company_id": 1,
"asset_id": 123456,
"version": 1,
"provider": "my-company",
"collection": "company_name#formation-quality",
"data": {
"gamma_ray": 82.4,
"rop": 127.5
}
}

log_identifier identifies the depth stream that produced the data. A single well can have multiple depth streams, such as MWD formation evaluation, MWD mechanical, mud properties, gas data, or drilling depth data.
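The effect of scoping by log_identifier can be sketched with plain dictionaries. In the example below, `group_by_log` is a hypothetical helper and the second log identifier is invented for illustration; the point is only that records at the same measured depth stay separate when they come from different depth streams.

```python
from collections import defaultdict


def group_by_log(records):
    """Group depth records by log_identifier, each group sorted by depth."""
    groups = defaultdict(list)
    for record in records:
        groups[record["log_identifier"]].append(record)
    return {
        log_id: sorted(items, key=lambda r: r["measured_depth"])
        for log_id, items in groups.items()
    }


records = [
    {"measured_depth": 6026, "log_identifier": "751a6e120a86",
     "data": {"gamma_ray": 83.1}},
    {"measured_depth": 6025, "log_identifier": "751a6e120a86",
     "data": {"gamma_ray": 82.4}},
    # Hypothetical second depth stream on the same asset.
    {"measured_depth": 6025, "log_identifier": "a1b2c3d4e5f6",
     "data": {"rop": 127.5}},
]
by_log = group_by_log(records)
```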

Common optional fields:

{
"metadata": {
"gamma_ray_unit": "gAPI",
"rop_unit": "ft/hr"
},
"app_key": "my-company.formation-quality",
"app_version": 2
}

Example Dataset Explorer query:

{
"log_identifier": "751a6e120a86",
"measured_depth": { "$gte": 6025 }
}

Reference Datasets

Use a reference dataset when records are not naturally ordered by time or measured depth. Think of a reference dataset like a lookup table or a durable set of business records.

Good use cases:

  • User-entered configuration for an app.
  • A mapping between external IDs and Corva IDs.
  • A list of allowed thresholds, labels, tools, or categories.
  • Notes or handover records that are searched by fields in data.

Do not use a reference dataset for stream or depth data. Reference datasets are also not the right choice if you need Corva's custom WebSocket subscription flow for live time/depth updates.

Required record shape:

{
"company_id": 1,
"version": 1,
"provider": "my-company",
"collection": "company_name#activity-labels",
"data": {
"activity_code": "SLIDE",
"display_name": "Slide Drilling",
"color": "#4D8CFF"
}
}

Reference records can include asset_id, timestamp, measured_depth, metadata, app_key, or app_version when those fields are useful, but adding those fields does not make the dataset time-based or depth-based. If the records will mainly be queried by timestamp or measured_depth, choose the matching dataset type instead.

Example Dataset Explorer query:

{
"data.display_name": { "$in": ["Slide Drilling", "Rotary Drilling"] }
}
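Because reference records are identified by a business key rather than by time or depth, an app typically turns them into a lookup table after fetching them. The following is a minimal sketch under that assumption; `build_lookup` is a hypothetical helper, and the second record (code "ROTARY" and its color) is invented for illustration.

```python
def build_lookup(records, key_field="activity_code"):
    """Index reference records by a business key stored under data."""
    lookup = {}
    for record in records:
        key = record["data"][key_field]
        if key in lookup:
            # A lookup table should have exactly one record per key.
            raise ValueError(f"duplicate business key: {key!r}")
        lookup[key] = record["data"]
    return lookup


records = [
    {"company_id": 1, "collection": "company_name#activity-labels",
     "data": {"activity_code": "SLIDE", "display_name": "Slide Drilling",
              "color": "#4D8CFF"}},
    # Hypothetical second label.
    {"company_id": 1, "collection": "company_name#activity-labels",
     "data": {"activity_code": "ROTARY", "display_name": "Rotary Drilling",
              "color": "#F5A623"}},
]
labels = build_lookup(records)
```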

What About Time-Series?

Dataset Explorer may show Time-series as a fourth dataset type. Do not treat it as another name for Time-Based.

For normal custom app data that has a timestamp, choose Time-Based. Choose Time-series only when the product workflow, platform team, or template you are following explicitly requires the Time-series dataset type.

Examples:

| Situation | Choose |
| --- | --- |
| A backend app saves calculated values every minute | Time-Based |
| A frontend app plots custom records over a time range | Time-Based |
| A workflow or template specifically says to create a Time-series dataset | Time-series |
| You want min, max, average, or median summaries | Not a type decision; handle this as summary configuration when available |
| You want records copied during reruns or WebSocket events published | Not a type decision; handle this as dataset/app configuration when available |

If the only reason you are considering Time-series is "the data changes over time", use Time-Based.

Dataset Capabilities Are Separate From Dataset Type

Dataset type answers one question: what indexes the records?

Other behavior should be considered separately:

| Capability | What to decide |
| --- | --- |
| Rerun behavior | Should records be copied, recreated, or preserved when an asset is rerun? |
| WebSocket events | Should new or updated records publish live events to frontend subscribers? |
| Summary values | Should simple summaries such as min, max, average, or median be generated from this dataset? |
| Asset lifecycle | Should records move with an asset, be deleted with an asset, or remain as company-level reference data? |

When these capabilities are available in the dataset or app configuration, configure them there. Do not create the wrong dataset type to get a lifecycle behavior, a subscription behavior, or a summary behavior.
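One way to keep the two concerns apart while planning is to record them in separate structures. The sketch below is a planning aid only: `DatasetPlan`, `Capabilities`, and their field names are hypothetical and do not correspond to actual Corva configuration keys.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Capabilities:
    """Behavior decided independently of the dataset type (hypothetical names)."""
    copy_on_rerun: bool = False
    publish_websocket_events: bool = False
    generate_summaries: bool = False


@dataclass(frozen=True)
class DatasetPlan:
    """The type describes the index; capabilities are tracked separately."""
    name: str
    dataset_type: str
    capabilities: Capabilities = field(default_factory=Capabilities)


plan = DatasetPlan(name="company_name#pump-health", dataset_type="Time-Based")

# Adding a capability requirement leaves the dataset type untouched.
live_plan = replace(
    plan, capabilities=Capabilities(publish_websocket_events=True)
)
```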

Create the Dataset

  1. Open Dev Center.
  2. Select Dataset Explorer in the top navigation.
  3. Select the dataset type: Time-Based, Depth-Based, Reference, or Time-series if a workflow explicitly requires it.

  4. Click + NEW DATASET.

  5. Fill in only the dataset fields you need for creation:
{
"name": "company_name#pump-health",
"description": "Calculated pump health values by timestamp",
"data_type": "time",
"indexes": [],
"schema": {}
}

Use a clear name because it becomes the collection key used by apps and API requests. Dataset names must start with the company prefix followed by #:

company_name#short-purpose-name

Examples:

  • company_name#pump-health
  • company_name#formation-quality
  • company_name#activity-labels

  6. Leave indexes and schema empty unless you have a specific schema/index requirement and understand how it will affect writes and queries.
  7. Click Save.
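The naming rule from the steps above (company prefix, then #, then a purpose name) can be checked before the dataset is created. This is a minimal sketch: `has_company_prefix` is a hypothetical helper, and it checks only the prefix rule stated on this page, not any additional character restrictions the platform may enforce.

```python
def has_company_prefix(name: str, company_prefix: str) -> bool:
    """Return True when name is '<company_prefix>#<non-empty purpose name>'."""
    prefix = company_prefix + "#"
    return name.startswith(prefix) and len(name) > len(prefix)


ok = has_company_prefix("company_name#pump-health", "company_name")
missing_prefix = has_company_prefix("pump-health", "company_name")
empty_purpose = has_company_prefix("company_name#", "company_name")
```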

Quick Choice Examples

| Scenario | Correct type | Why |
| --- | --- | --- |
| Save WITS-derived values every minute | Time-Based | Records are appended by timestamp. |
| Save gamma ray every foot | Depth-Based | Records are appended by measured_depth and scoped by log_identifier. |
| Store app threshold settings | Reference | The records are configuration, not a time/depth series. |
| Store frac stage metrics over time | Time-Based | The primary index is timestamp; stage_number is additional context. |
| Store a lookup table of activity names and colors | Reference | Users query by fields such as data.activity_code. |
| Store calculated values for a depth interval | Depth-Based | The values belong to a measured-depth range. |
| Follow a template that explicitly requires Time-series | Time-series | The workflow names that dataset type directly. |

Common Mistakes

  • Using Reference for asset data that changes over time. Use time-based if users will query by time range, plot historical values, or subscribe to live updates.
  • Using Time-Based for depth curves. Use depth-based when the same timestamp can contain multiple depth readings or when depth is the natural axis.
  • Ignoring log_identifier on depth data. Without it, records from different depth streams on the same asset can be mixed together.
  • Using Time-series just because the data has timestamps. Most timestamped custom app data should be Time-Based unless a workflow specifically requires Time-series.
  • Putting units inside field names. Prefer metadata for units so field names stay stable.
  • Changing dataset type to solve rerun or WebSocket behavior. Treat those as capabilities/configuration requirements.
  • Choosing a vague collection name. A name like company_name#data becomes hard to maintain once multiple apps use it.
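The advice about units can be illustrated with a small builder that keeps field names unit-free and writes units to metadata using the `<field>_unit` convention from the samples on this page. `with_units` is a hypothetical helper, shown only as a sketch.

```python
def with_units(values: dict, units: dict) -> dict:
    """Build the data and metadata parts of a record.

    Field names stay stable; units are stored as '<field>_unit' in metadata.
    """
    unknown = set(units) - set(values)
    if unknown:
        raise ValueError(f"units given for missing fields: {sorted(unknown)}")
    return {
        "data": dict(values),
        "metadata": {f"{name}_unit": unit for name, unit in units.items()},
    }


parts = with_units(
    {"pressure": 8421.3, "rate": 74.2},
    {"pressure": "psi", "rate": "bpm"},
)
```

If the unit of a measurement later changes, only the metadata value changes; queries and plots that reference data.pressure keep working.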

Final Check

Before saving, make sure this sentence is true:

This dataset is Time-Based / Depth-Based / Reference / Time-series because each record is naturally identified by timestamp / measured depth / business key / an explicit Time-series workflow requirement.

If that sentence is awkward, the dataset type probably needs another look before records are written.